{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "d15d8294-3328-4e07-ad16-8a03e9bbfdb9",
   "metadata": {},
   "source": [
    "# MY !FIRST LAB\n",
    "\n",
    "### Script will take a stackoverflow issue and summarize it as a technical tutorial. \n",
    "\n",
    "Example links to use: \n",
    "    \n",
    "https://stackoverflow.com/questions/14220321/how-do-i-return-the-response-from-an-asynchronous-call \n",
    "https://stackoverflow.com/questions/60174/how-can-i-prevent-sql-injection-in-php\n",
    "https://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags\n",
    "\n",
    "*Note: Issues must be answered preferebly by a lot of users.*\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "e2fd67f3-6441-4fee-b19c-7c91e6188348",
   "metadata": {},
   "outputs": [],
   "source": [
    "website = 'https://stackoverflow.com/questions/60174/how-can-i-prevent-sql-injection-in-php'"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "4e2a9393-7767-488e-a8bf-27c12dca35bd",
   "metadata": {},
   "outputs": [],
   "source": [
    "# imports\n",
    "\n",
    "import os\n",
    "import requests\n",
    "from dotenv import load_dotenv\n",
    "from bs4 import BeautifulSoup\n",
    "from IPython.display import Markdown, display\n",
    "from openai import OpenAI\n",
    "\n",
    "# If you get an error running this cell, then please head over to the troubleshooting notebook!"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "7b87cadb-d513-4303-baee-a37b6f938e4d",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "An API key was found, but it doesn't start sk-proj-; please check you're using the right key - see troubleshooting notebook\n"
     ]
    }
   ],
   "source": [
    "# Load environment variables in a file callwebsite_content .env\n",
    "\n",
    "load_dotenv(override=True)\n",
    "api_key = os.getenv('OPENAI_API_KEY')\n",
    "\n",
    "# Check the key\n",
    "\n",
    "if not api_key:\n",
    "    print(\"No API key was found - please head over to the troubleshooting notebook in this folder to identify & fix!\")\n",
    "elif not api_key.startswith(\"sk-proj-\"):\n",
    "    print(\"An API key was found, but it doesn't start sk-proj-; please check you're using the right key - see troubleshooting notebook\")\n",
    "elif api_key.strip() != api_key:\n",
    "    print(\"An API key was found, but it looks like it might have space or tab characters at the start or end - please remove them - see troubleshooting notebook\")\n",
    "else:\n",
    "    print(\"API key found and looks good so far!\")\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "id": "019974d9-f3ad-4a8a-b5f9-0a3719aea2d3",
   "metadata": {},
   "outputs": [],
   "source": [
    "openai = OpenAI()\n",
    "\n",
    "# If this doesn't work, try Kernel menu >> Restart Kernel and Clear Outputs Of All Cells, then run the cells from the top of this notebook down.\n",
    "# If it STILL doesn't work (horrors!) then please see the Troubleshooting notebook in this folder for full instructions"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "id": "c5e793b2-6775-426a-a139-4848291d0463",
   "metadata": {},
   "outputs": [],
   "source": [
    "# A class to represent a Webpage\n",
    "# If you're not familiar with Classes, check out the \"Intermwebsite_contentiate Python\" notebook\n",
    "\n",
    "# Some websites newebsite_content you to use proper headers when fetching them:\n",
    "headers = {\n",
    " \"User-Agent\": \"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/117.0.0.0 Safari/537.36\"\n",
    "}\n",
    "\n",
    "class Website:\n",
    "\n",
    "    def __init__(self, url):\n",
    "        \"\"\"\n",
    "        Create this Website object from the given url using the BeautifulSoup library\n",
    "        \"\"\"\n",
    "        self.url = url\n",
    "        response = requests.get(url, headers=headers)\n",
    "        soup = BeautifulSoup(response.content, 'html.parser')\n",
    "        self.title = soup.title.string if soup.title else \"No title found\"\n",
    "        for irrelevant in soup.body([\"script\", \"style\", \"img\", \"input\"]):\n",
    "            irrelevant.decompose()\n",
    "        self.text = soup.body.get_text(separator=\"\\n\", strip=True)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "id": "2ef960cf-6dc2-4cda-afb3-b38be12f4c97",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "mysql - How can I prevent SQL injection in PHP? - Stack Overflow\n",
      "Skip to main content\n",
      "Stack Overflow\n",
      "About\n",
      "Products\n",
      "OverflowAI\n",
      "Stack Overflow for Teams\n",
      "Where developers & technologists share private knowledge with c\n"
     ]
    }
   ],
   "source": [
    "# Let's try one out. Change the website and add print statements to follow along.\n",
    "\n",
    "website_content = Website(website)\n",
    "print(website_content.title[:100])\n",
    "print(website_content.text[:150])"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "6a478a0c-2c53-48ff-869c-4d08199931e1",
   "metadata": {},
   "source": [
    "## Types of prompts\n",
    "\n",
    "You may know this already - but if not, you will get very familiar with it!\n",
    "\n",
    "Models like GPT4o have been trained to receive instructions in a particular way.\n",
    "\n",
    "They expect to receive:\n",
    "\n",
    "**A system prompt** that tells them what task they are performing and what tone they should use\n",
    "\n",
    "**A user prompt** -- the conversation starter that they should reply to"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "268cb127-ec40-4016-9436-94a1ae10a1c6",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Define our system prompt - you can experiment with this later, changing the last sentence to 'Respond in markdown in Spanish.\"\n",
    "\n",
    "system_prompt = \"You are a technical writer that analyzes the contents of a stackoverflow website issue containing a question and answer \\\n",
    "and provides a summary in the form of a technical tutorial , ignoring text that might be navigation related. \\\n",
    "Respond in markdown.\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "id": "f0275b1b-7cfe-4f9d-abfa-7650d378da0c",
   "metadata": {},
   "outputs": [],
   "source": [
    "# A function that writes a User Prompt that asks for summaries of websites:\n",
    "\n",
    "def user_prompt_for(website):\n",
    "    user_prompt = f\"You are looking at a website titled {website.title}\"\n",
    "    user_prompt += f\"\"\" \n",
    "\n",
    "    You are looking at a website titled {website_content.title}\n",
    "\n",
    "    Create a technical tutorial baswebsite_content on the following Stack Overflow content:\n",
    "    \n",
    "    {website_content.text}\n",
    "\n",
    "\n",
    "    The tutorial should include an introduction, problem statement, solution steps, and conclusion.\n",
    "    Tutrial should be in markdown format.\n",
    "    \"\"\"\n",
    "    user_prompt += website.text\n",
    "    return user_prompt"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "id": "26448ec4-5c00-4204-baec-7df91d11ff2e",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "You are looking at a website titled mysql - How can I prevent SQL injection in PHP? - Stack Overflow \n",
      "\n",
      "    You are looking at a website titled mysql - How can I prevent SQL injection in PHP? - Stack Overflow\n",
      "\n",
      "    Create a technical tutorial baswebsite_content on the following Stack Overflow content:\n",
      "\n",
      "    Skip to main content\n",
      "Stack Overflow\n",
      "About\n",
      "Products\n",
      "OverflowAI\n",
      "Stack Overflow for Teams\n",
      "Where developers & technologists share private knowledge with coworkers\n",
      "Advertising & Talent\n",
      "Reach devs & t\n"
     ]
    }
   ],
   "source": [
    "print(user_prompt_for(website_content)[:500])"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ea211b5f-28e1-4a86-8e52-c0b7677cadcc",
   "metadata": {},
   "source": [
    "## Messages\n",
    "\n",
    "The API from OpenAI expects to receive messages in a particular structure.\n",
    "Many of the other APIs share this structure:\n",
    "\n",
    "```\n",
    "[\n",
    "    {\"role\": \"system\", \"content\": \"system message goes here\"},\n",
    "    {\"role\": \"user\", \"content\": \"user message goes here\"}\n",
    "]\n",
    "\n",
    "To give you a preview, the next 2 cells make a rather simple call - we won't stretch the mighty GPT (yet!)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d06e8d78-ce4c-4b05-aa8e-17050c82bb47",
   "metadata": {},
   "source": [
    "## And now let's build useful messages for GPT-4o-mini, using a function"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "id": "0134dfa4-8299-48b5-b444-f2a8c3403c88",
   "metadata": {},
   "outputs": [],
   "source": [
    "# See how this function creates exactly the format above\n",
    "\n",
    "def messages_for(website):\n",
    "    return [\n",
    "        {\"role\": \"system\", \"content\": system_prompt},\n",
    "        {\"role\": \"user\", \"content\": user_prompt_for(website)}\n",
    "    ]"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "16f49d46-bf55-4c3e-928f-68fc0bf715b0",
   "metadata": {},
   "source": [
    "## Time to bring it together - the API for OpenAI is very simple!"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "id": "905b9919-aba7-45b5-ae65-81b3d1d78e34",
   "metadata": {},
   "outputs": [],
   "source": [
    "# And now: call the OpenAI API. You will get very familiar with this!\n",
    "\n",
    "def summarize(url):\n",
    "    website = Website(url)\n",
    "    response = openai.chat.completions.create(\n",
    "        model = \"gpt-4o-mini\",\n",
    "        messages = messages_for(website)\n",
    "    )\n",
    "    return response.choices[0].message.content"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "id": "3d926d59-450e-4609-92ba-2d6f244f1342",
   "metadata": {},
   "outputs": [],
   "source": [
    "# A function to display this nicely in the Jupyter output, using markdown\n",
    "\n",
    "def display_summary(url):\n",
    "    summary = summarize(url)\n",
    "    display(Markdown(summary))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "id": "0a6970cc-bed8-4759-a312-3b81236c2f4e",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/markdown": [
       "```markdown\n",
       "# How to Prevent SQL Injection in PHP\n",
       "\n",
       "## Introduction\n",
       "SQL injection is a serious security vulnerability that can allow an attacker to interfere with the queries that your application makes to the database. By exploiting this vulnerability, an attacker can gain unauthorized access to sensitive data, manipulate data, and even execute administrative operations on the database. This tutorial will guide you on how to prevent SQL injection in your PHP applications through various best practices.\n",
       "\n",
       "## Problem Statement\n",
       "Consider the following PHP code that is vulnerable to SQL injection:\n",
       "\n",
       "```php\n",
       "$unsafe_variable = $_POST['user_input']; \n",
       "mysql_query(\"INSERT INTO `table` (`column`) VALUES ('$unsafe_variable')\");\n",
       "```\n",
       "\n",
       "If a user were to input something like `value'); DROP TABLE table;--`, the query would become:\n",
       "\n",
       "```sql\n",
       "INSERT INTO `table` (`column`) VALUES('value'); DROP TABLE table;--');\n",
       "```\n",
       "\n",
       "This inserts an unwanted SQL command leading to disastrous effects on the database.\n",
       "\n",
       "## Solution Steps\n",
       "\n",
       "### 1. Use Prepared Statements\n",
       "The best method to prevent SQL injection is to use prepared statements with parameterized queries. This separates SQL logic from data, ensuring that user input is treated as data, not executable code.\n",
       "\n",
       "#### Using PDO\n",
       "Here's how to use PDO in PHP:\n",
       "\n",
       "```php\n",
       "$dsn = 'mysql:dbname=dbtest;host=127.0.0.1;charset=utf8mb4';\n",
       "$dbConnection = new PDO($dsn, 'user', 'password');\n",
       "$dbConnection->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);\n",
       "$dbConnection->setAttribute(PDO::ATTR_EMULATE_PREPARES, false);\n",
       "\n",
       "$stmt = $dbConnection->prepare('SELECT * FROM users WHERE name = :name');\n",
       "$stmt->execute(['name' => $name]);\n",
       "\n",
       "foreach ($stmt as $row) {\n",
       "    // Process row\n",
       "}\n",
       "```\n",
       "\n",
       "#### Using MySQLi\n",
       "If you're using MySQLi, the syntax is slightly different:\n",
       "\n",
       "```php\n",
       "$dbConnection = new mysqli('127.0.0.1', 'username', 'password', 'test');\n",
       "$dbConnection->set_charset('utf8mb4');\n",
       "\n",
       "$stmt = $dbConnection->prepare('SELECT * FROM users WHERE name = ?');\n",
       "$stmt->bind_param('s', $name); // 's' stands for string\n",
       "$stmt->execute();\n",
       "$result = $stmt->get_result();\n",
       "\n",
       "while ($row = $result->fetch_assoc()) {\n",
       "    // Process row\n",
       "}\n",
       "```\n",
       "\n",
       "### 2. Properly Configure the Database Connection\n",
       "When using PDO, ensure that emulated prepared statements are disabled. This is essential for real prepared statements to take effect.\n",
       "\n",
       "Example configuration:\n",
       "```php\n",
       "$dbConnection->setAttribute(PDO::ATTR_EMULATE_PREPARES, false);\n",
       "```\n",
       "\n",
       "### 3. Validate Input Data\n",
       "In addition to using prepared statements, you should validate and sanitize user inputs. Implementing whitelist validation can help by ensuring only expected values are processed.\n",
       "\n",
       "For example, if you expect a sorting direction:\n",
       "```php\n",
       "$dir = !empty($_GET['dir']) && $_GET['dir'] === 'DESC' ? 'DESC' : 'ASC';\n",
       "```\n",
       "\n",
       "### 4. Limit Database Permissions\n",
       "Restrict database user permissions to the minimum required for their role. For example, a user who only needs to read data should not have permissions to delete or alter it.\n",
       "\n",
       "```sql\n",
       "GRANT SELECT ON database TO 'username'@'localhost';\n",
       "```\n",
       "\n",
       "### 5. Regularly Update Your Codebase\n",
       "Keep libraries and the PHP version you are using up-to-date. Deprecated functions and libraries often contain vulnerabilities that can be exploited.\n",
       "\n",
       "## Conclusion\n",
       "Preventing SQL injection in PHP applications requires a proactive approach. Using prepared statements ensures user input is handled securely, while validating data and limiting permissions fortifies your application against potential attacks. By implementing these best practices, you can significantly reduce the risk of SQL injection vulnerabilities in your applications.\n",
       "\n",
       "For more in-depth information on SQL injection prevention techniques, consult the [OWASP SQL Injection Prevention Cheat Sheet](https://owasp.org/www-community/attacks/SQL_Injection).\n",
       "```"
      ],
      "text/plain": [
       "<IPython.core.display.Markdown object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "display_summary(website)"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.11.11"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
